Seed-Detective: A Novel Clustering Technique Using High Quality Seed for K-Means on Categorical and Numerical Attributes

Authors

  • Md Anisur Rahman
  • Md Zahidul Islam
Abstract

In this paper we present a novel clustering technique called Seed-Detective. It combines modified versions of two existing techniques, namely Ex-Detective and Simple K-Means. Seed-Detective first discovers a set of preliminary clusters using our modified Ex-Detective, which allows a data miner to assign different weights (importance levels) to all attributes, both numerical and categorical. The centers of the preliminary clusters are then used as the initial seeds for the modified Simple K-Means, which, unlike the existing Simple K-Means, does not select the initial seeds randomly. Centers of the preliminary clusters are naturally expected to be better quality seeds than randomly chosen ones, and with better quality initial seeds as input, the modified Simple K-Means is expected to produce better quality clusters. We compare Seed-Detective with several existing techniques, including Ex-Detective, Simple K-Means, Basic Farthest Point Heuristic (BFPH) and New Farthest Point Heuristic (NFPH), on two publicly available natural data sets. BFPH and NFPH were shown in the literature to be better than Simple K-Means. However, our initial experimental results indicate that Seed-Detective produces better clusters than the other techniques, based on several evaluation criteria including F-measure, entropy and purity. Another contribution of this paper is the experimental evaluation of Ex-Detective, which had not been tested before.
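The seeding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preliminary labels below are hand-written stand-ins for the output of the modified Ex-Detective (which is not reproduced here), only numerical attributes with unweighted squared Euclidean distance are handled, and the function names (centers_of, seeded_kmeans, purity, entropy) are hypothetical. Purity and entropy follow their common textbook definitions, which may differ in detail from those used in the paper.

```python
from collections import Counter
from math import log2

def centers_of(records, labels):
    """Mean of each preliminary cluster (numerical attributes only)."""
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(lab, []).append(rec)
    return [tuple(sum(col) / len(g) for col in zip(*g))
            for _, g in sorted(groups.items())]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(records, seeds, max_iter=100):
    """K-Means that starts from the given seeds instead of random ones."""
    centers = list(seeds)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(r, centers[j]))
            for r in records
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for j in range(len(centers)):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assignment, centers

def purity(assignment, classes):
    """Fraction of records that belong to the majority class of their cluster."""
    total = 0
    for j in set(assignment):
        counts = Counter(c for a, c in zip(assignment, classes) if a == j)
        total += max(counts.values())
    return total / len(classes)

def entropy(assignment, classes):
    """Size-weighted average of the class entropy inside each cluster."""
    n, result = len(classes), 0.0
    for j in set(assignment):
        counts = Counter(c for a, c in zip(assignment, classes) if a == j)
        size = sum(counts.values())
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        result += (size / n) * h
    return result

# Toy data (assumed for illustration only).
records = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (0.9, 1.1), (4.9, 5.1)]
preliminary = [0, 0, 1, 1, 1, 1]          # stand-in for preliminary clusters
classes = ["a", "a", "b", "b", "a", "b"]  # ground-truth labels for evaluation

seeds = centers_of(records, preliminary)
assignment, centers = seeded_kmeans(records, seeds)
```

In this toy run the fifth record is misplaced in the preliminary clustering, and the seeded K-Means pass moves it to the nearer cluster. Because the seeds are computed rather than drawn at random, repeated runs on the same input give the same result.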

Similar resources

ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means

Keywords: Clustering; Classification; K-Means; Cluster evaluation; Data mining. Abstract: In this paper we present two clustering techniques called ModEx and Seed-Detective. ModEx is a modified version of an existing clustering technique called Ex-Detective. It addresses some limitations of Ex-Detective. Seed-Detective is a combination of ModEx and Simple K-Means. Seed-Detective uses ModEx to produce a set ...

A New Clustering Algorithm for Categorical Attributes

Clustering over categorical attributes is an important yet tough task. In this paper, we present a new algorithm K-meansII to extend the famous K-means algorithm which is efficient only on numerical clustering, by using new cluster center definitions and new similarity measures. Thus, our algorithm can be used in categorical clustering while preserving the efficiency. Experiments on both real-l...

CRUDAW: A Novel Fuzzy Technique for Clustering Records Following User Defined Attribute Weights

We present a novel fuzzy clustering technique called CRUDAW that allows a data miner to assign weights on the attributes of a data set based on their importance (to the data miner) for clustering. The technique uses a novel approach to select initial seeds deterministically (not randomly) using the density of the records of a data set. CRUDAW also selects the initial fuzzy membership degrees de...

GROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION

Stochastic nature of earthquake has raised a challenge for engineers to choose which record for their analyses. Clustering is offered as a solution for such a data mining problem to automatically distinguish between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek for the best clustering measures. In...

Effect of Harvesting Stages and Nitrogen on Seed Quality and Yield of Jute Mallow (Corchorus olitorius L.)

Production of high quality seeds in African leafy vegetables has not been practiced due to varying reasons including incorrect harvesting stages and fertilizer rates. Jute mallow (Corchorus olitorius L.) pods do not ripen simultaneously and fruits left to dry on mother plant long before harvesting, which face seed quality deterioration. Timely seed harvesting ensures maximum seed quality attrib...

Publication date: 2011